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There has been much recent progress in the understanding and reduction of the computational 
cost of the Hybrid Monte Carlo algorithm for Lattice QCD as the quark mass parameter is re- 
duced. In this letter we present a new solution to this problem, where we represent the fermionic 
determinant using n pseudofermion fields, each with an n"" root kernel. We implement this 
within the framework of the Rational Hybrid Monte Carlo (RHMC) algorithm. We compare 
this algorithm with other recent methods in this area and find it is competitive with them. 
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The motivation for this work is the need for faster al- 
gorithms to perform lattice QCD calculations near the 
chiral limit. To include the effects of fermions in our cal- 
culations we are required to invert the Dirac operator, 
which is a very large sparse matrix: as the fermion mass 
is decreased the computational cost increases with the 
condition number of the matrix k(A^). Lattice QCD cal- 
culations involve the integration of Hamilton's equations 
of motion, and the fermionic force acting on the gauge 
fields also increases with decreasing fermion mass. To 
maintain the stability of the integrator the integration 
step size must be reduced, thus increasing the cost. In 
this paper we shall not address the first of these prob- 
lems, but shall introduce a method to bring the latter 
under control. This work is related to similar work by 
Hasenbusch but our method requires less parameter 
tuning. 

When performing a lattice QCD simulation, we desire 
gauge field configurations U distributed according to the 
probability density 



P{U) = le~^s(c/) det7W(C7), 



where is the pure gauge action and detAl is the de- 
terminant of the Dirac operator Al = (0 + m)^(0 -f m), 
which appears after integrating out the Grassmann- 
valued quark fields. The operator is the discretized 
covariant derivative, and m is the fermion mass. We 
represent the determinant as a pseudofermion Gaussian 
functional integral (detAl oc J d(\)d<f)'^ exp (—(/i^A^"-'^ (/))), 



giving the probability density 



P([/, 
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Almost all techniques for generating gauge configura- 
tions consist of variants of Hybrid algorithms, these be- 
ing algorithms which combine momentum and pseudo- 
fermion refreshment heatbaths with molecular dynam- 
ics (MD) integration of the gauge field. The latter 
is done through the introduction of a "fictitious" mo- 
mentum field TT, with which we define the Hamiltonian 
H = iTT^ -I- ScS- The gauge fields can then be allowed to 
evolve for a time r by integrating Hamilton's equations. 
Hybrid algorithms are ergodic and their fixed point is 
close to but not precisely the desired one. This discrep- 
ancy is due to the inexact integration of Hamilton's equa- 
tions: the Hamiltonian is conserved to 0{5t^), where 
k is determined by the order of the integration scheme 
used. Hybrid algorithms which use area preserving and 
reversible (symplectic and symmetric) integrators can be 
made exact through the addition of a Metropolis accep- 
tance test at the end of the MD trajectory, which stochas- 
tically corrects for these errors. The Hybrid Monte Carlo 
algorithm (HMC) algorithm is the de facto method for 
generating the required probability distribution, of which 
a single update consists of the following Markov steps 

• Momentum refreshment heatbath using Gaussian 
noise (/'(tt) oc /^). 

• Pseudofermion refreshment {P{4>) (x (0 + myrj, 
where P{ri) oc e^"^^"^). 

• MDMC, which consists of 

— MD trajectory consisting of t/5t steps. 

— Metropolis accept /reject with probability 
P_ = min(l,e-^^). 
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When updating the momentum there are contributions 
to the force from both the pure gauge part of the action 
dSg/dU, and the fermionic part dSt/dU. As the fermion 
mass is decreased the latter becomes the dominant con- 
tribution. To avoid an instability in the integrator and 
to maintain a non-negligible acceptance rate we must re- 
duce the step size 5t: this makes the computation very 
expensive since at every MD step we must solve a large 
system of linear equations. 

Since the fermion determinant is represented using a 
single pseudofermion configuration selected from a Gaus- 
sian heatbath, the variance of this stochastic estimate 
will lead to statistical fluctuations in the fermionic force: 
the pseudofermionic force may be larger than the exact 
fermionic force, which is the derivative dti:\nM.{U)/dU 
with respect to the gauge field U . This means that the 
pseudofermionic force may trigger the instability in the 
integrator even though the exact force would not. 

An obvious way of ameliorating this effect is to use 
n > 1 pseudofermion fields to sample the functional in- 
tegral: this is achieved simply by writing 

dctAl = [dctAfi/"]" 

n 

oc W d(l)j d(f>l exp (-0]A^-i/"0j) , 
i=i 

that is introducing n pseudofermion fields each with 
kernel M"!/". 

It is well known that the instability in the integrator 
is triggered by isolated small modes of the fermion ker- 
nel ^J. These modes are of magnitude 0(l/m^) with 
the standard kernel; with our multiple pseudofermion 
approach these modes are now 

0(l/m2)i/"^ and so are 
vastly reduced in magnitude. We would thus expect that 
the instability is shifted to occur at a far greater inte- 
grating step size. 

If we make the simple-minded estimate that the magni- 
tude of the fermion force is proportional to the condi- 
tion number of the fermion matrix multiplied by the step 
size, then we can find the optimum value of n. We must 
keep the maximum force fixed so as to avoid the insta- 
bility in the integrator, so we may increase the integra- 
tion step size to St' such that nK{Aiy^"6T' — k{M)St, 
where we have used the fact that = [k{M)]^^". 

At constant trajectory length and acceptance rate, and 
hence constant autocorrelation times, the cost of an 
HMC trajectory is proportional to the ratio n/Sr' , and 
thus is minimized by choosing n so as to minimize 
uSt/St' — which leads to the condition 

n^pt = ilnK(Al), corresponding to cost reduction by a 
factor oinSr/dT' = [e\-aK{M)f / [4,k{M)]. 

Our method is to apply the Rational Hybrid Monte 
Carlo (RHMC) algorithm to generate gauge field and 
pseudofermion configurations distributed according to 
the probability density 

P(C/, </.!,..., 0„) = ^ exp [~S,iU) - S,{M,n)] 



where Si{M,n) = E"=i </']-^"^^"</'i- Optimal Cheby- 
shev rational approximations are used to evaluate the 
matrix functions, and we proceed as we would for con- 
ventional HMC U- If written in partial fraction form 
r{M) = OLk/{M. + (3k), the rational function can 

be evaluated using a multi-shift solver 0. The re- 
sulting computational cost per pseudofermion field very 
similar to HMC ,20] since the shifts Pk are all positive, 
the most costly shift being that which is closest to zero. 
Remarkably, all the coefRcients are also positive, so 
the procedure is numerically stable. 

At this point it is worth comparing our method to 
the multiple pseudofermions through mass precondition- 
ing method, or the so called Hasenbusch trick 0. In 
the latter, the fermion determinant is written det Ai = 
det det[A^/A^], where the mass parameter used in 
Ai is larger than that in A4. The original idea behind 
this method was to tune the mass of the dummy op- 
erator M such that the operators had a similar condi- 
tion number 0- An increase in step size would then 
be possible for the same reasons given above. The ad- 
vantage with this method compared to RHMC is that 
the extra operators introduced are heavier, and hence 
cheaper to evaluate compared to the original kernel. Re- 
cently, larger speedups have been found through tuning 
the dummy mass(es) such that the action constituents 
with the greatest forces are those which are cheapest to 
evaluate, i.e., the heaviest j^]. A multi- level integration 
scheme 3 is then used which evaluates the cheaper and 
dominant forces more frequently than the more expen- 
sive and smaller forces. Tuning the mass parameters 
with both of these methods requires some effort, and even 
more so as further dummy operators are introduced. This 
compares to our RHMC method which requires no tuning 
of the extra operators since all operators are identical. 

RHMC has the added virtue in that it allows the in- 
clusion of less fermions than are described by the kernel 
Ai (typically this represents two fermions). For example 
to simulate full QCD, we are required to include the con- 
tribution from the light quark pair (at present we always 
assume niu — rud) and the strange quark. Traditionally 
the inclusion of the strange quark has proved problem- 
atic, and the use of an inexact algorithm |10| has been 
required. An alternative has been to use polynomial ap- 
proximations, but such an approach requires either a very 
large degree polynomial (> 0(1000) for light quarks), or 
a correction step which is applied with the acceptance 
test or when making measurements RHMC al- 

lows the strange quark to be included simply through 
the use of the rational approximation V At ~ r(A4), and 
because of the high accuracy of rational approximations, 
no correction step is required. 

An important observation that has been made with 
prior use of RHMC is that the derivatives of the partial 
fractions have vastly different magnitudes Fig" 
ure n we show the variation of magnitude of the force 
with each partial fraction, in order of increasing shift. 
The surprise is that the smallest shifts contribute least 
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FIG. 1: The variation of the force magnitude {L2 norm), 
conjugate gradient (CG) cost and parameter for each par- 
tial fraction with n = 2, 3 pseudofermions (24^ x 32 lattice, 
P = 5.6, K = 0.15800, CG residual = 10"*^ for all shifts). 



FIG. 2: The variation of SH with step size for n = 1, 2, 3, the 
integrating step size has been fixed at approximately St = 
0.01 (Wilson gauge action. Staggered fermions, 16* lattice, 
P = 5.76, m = 0.005). 



to the total fcrmion force. Also included on the plot is 
the number of conjugate iterations required in the multi- 
shift solver to reduce the residual for each shifted system 
by six orders of magnitude relative to the source. The 
most expensive constituents of the fermion force actually 
have the smallest magnitude. This effect may in part 
due to the fact that the density of states of the Dirac 
operator is greatest in the bulk, but principally because 
of the nature of the rational coefhcients. This latter ef- 
fect is enhanced as n is increased because the coefficients 
Qffc become smaller for light shifts, and larger for heavier 
shifts (see Figure P). 

We can make use of this observation by two methods. 
In the spirit of @ we can construct a multi-timescale nu- 
merical integrator that assigns a larger step size to the 
small shifts, the ratio of the two step sizes being chosen 
such that the product F6t is the same. In practice we 
have found the simpler approach where the smaller shifts 
are given a looser stopping condition than the heavier 
shifts while keeping the step size the same for all shifts, 
to be a more effective approach. It is important to men- 
tion that loosening the stopping condition of the poles 
has no effect on the reversibility of the molecular dynam- 
ics |3|. Here we are loosening the stopping condition of 
the smaller shifts which are less important for evaluating 
the total fermion force. 

To test our hypothesis that the use of multiple pseudo- 
fermions removes the integrator instability, we produced 
the data shown in Figure |21 using a relatively modest lat- 
tice volume. On a logarithmic scale we plot the value of 
{SH^)^^^ versus the step size for n = 1,2 and 3 pseudo- 
fermions. We used a multi-timescale integrator to isolate 
the effect of the fermions from that of the gauge action. 
For n = 1 when the step size reaches St = 0.066, SH 
explodes by four orders of magnitude, this corresponds 
to the value for which the instability is triggered. For 
n = 2 and 3 not only is energy conservation better, but 
also the instability has been removed. 

In conventional HMC the second order leapfrog inte- 



grator is usually found to be optimal, and the use of 
higher order integrators are found to decrease perfor- 
mance. Higher order integrators are more susceptible 
to the instability discussed before because they are con- 
structed from longer sub-steps than the original leapfrog 
step. If multiple pseudofermions bring the instability un- 
der control, higher order integrators should now prove ad- 
vantageous; this is indeed found to be the case. We have 
tried a variety of fourth order integrators: the fourth 
order Campostrini integrator and fourth order mini- 
mum norm integrators . While these all help we have 
generally found the fourth order minimum norm integra- 
tor (4MN5FV) to be optimal. Initial investigation has 
not found any gain from using a sixth order integrator, 
though this (or an even higher order integrator) should 
not be ruled out from future investigation. 





n 




/nrc 


Integrator 


T 


St 


0.15750 


2 


11 


15 


2MN 


1.0 


0.1 


0.15800 


3 


12 


16 


4MN5FV 


2.0 


0.25 


0.15825 


3 


12 


16 


4MN5FV 


2.0 


0.25 



TABLE I: Table of parameters used for this study. The 2MN 
integrator is the second order minimum norm integrator ^^■ 
For the MD the smallest two shifts had a CG residual of 10~ , 
10^^ for the next two, and 10~® for the remaining shifts. For 
all shifts the CG residuals were set to 10"^" for the heatbath 
and action evaluations. 

To test the efficiency of our final algorithm we choose 
the sameparameters that have been used in recent publi- 
cations namely a 24'^ x 32 lattice, using a Wilson 
gauge action (/? = 5.6) together with unimproved Wilson 
fermions with three different mass parameters. 

A summary of our results compared to multi-timescale 
mass preconditioning Q and conventional HMC can 
be seen in Table |H] Our algorithm is very similar in 
performance to that presented in 1^, and both of these 
algorithms are clearly superior to conventional HMC as 
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17.36 


128.000 


0.15825 


0.911(12) 


4.7(6) 


11.2 


52.50 


56.50 





TABLE II: Table of results comparing RHMC with conven- 
tional HMC and the multiple timescale mass precondition- 
ing presented in 8]. Our measure of cost is the product of 
the integrated autocorrelation of the plaquette Tpia, with the 
number of Dirac operator applications per trajectory A^'^v. 



expected. The results are also comparable with those 
of Liischer , but are far easier and more efficient to 
implement especially on fine grained parallel computers. 



The benefit that is gained from the improved HMC al- 
gorithms increases as the quark mass is reduced, and in 
this regime it would be interesting to further compare 
these algorithms. As the lattice volume is increased, we 
expect our method to prove more advantageous because 
of the improved volume dependence of higher order inte- 
grators (with a second order integrator, the cost of HMC 
is expected to scale V^^'^, whereas with a fourth order 
integrator the cost is expected to scale V^/^ 

The importance of these results is that the cost to 
generate gauge field configurations with light fermions, 
which is the most costly part of lattice field theory com- 
putations, has been drastically reduced. This corre- 
sponds to more than a four fold decrease in computer 
time. These techniques also promise to lead to similar 
improvements in other fields where pseudofermion tech- 
niques are used for fermionic Monte Carlo computations. 



Conclusions 

In this letter we have presented a simple improvement 
to the HMC algorithm to reduce the computation re- 
quired for Lattice QCD calculations. This method leads 
to a large gain in performance relative to the conven- 
tional algorithm. At the physical parameters analyzed, 
our method is competitive with the mass preconditioning 
method presented in moreover, the use of the RHMC 
algorithm permits the easy introduction of single quark 
flavors. In practice it is often advantageous to combine 
mass preconditioning with the present n"' root method. 
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